System for reducing delay for execution subsequent to correctly predicted branch instruction using fetch information stored with each block of instructions in cache

ABSTRACT

A super-scalar processor is disclosed wherein branch-prediction information is provided within an instruction cache memory. Each instruction cache block stored in the instruction cache memory includes, in addition to instruction fields, branch-prediction information fields which indicate the address of the instruction block's successor and the location of a branch instruction within the instruction block. Thus, the next cache block can be easily fetched without waiting on a decoder or execution unit to indicate the proper fetch action to be taken for correctly predicted branching.

BACKGROUND OF THE INVENTION

The present invention relates to a method and apparatus for improving processor performance by reducing processing delays associated with branch instructions. In particular, the present invention provides an instruction cache for a super-scalar processor wherein branch-prediction information is provided within the instruction cache.

The time taken by a computing system to perform a particular application is determined by three basic factors, namely, the processor cycle time, the number of processor instructions required to perform the application, and the average number of processor cycles required to execute an instruction. Overall system performance can be improved by reducing one or more of these factors. For example, the average number of cycles required to perform an application can be significantly reduced by employing a multi-processor architecture, i.e., providing more than one processor to execute separate instructions concurrently.

There are disadvantages, however, associated with the implementation of a multi-processor architecture. In order to be effective, multi-processing requires an application that can be easily segmented into independent tasks to be performed concurrently by the different processors. The requirement for a readily segmented task limits the effective applicability of multi-processing. Further, the increase in processing performance attained via multi-processing in many circumstances may not offset the additional expense incurred by requiring multiple processors.

Single-processor hardware architectures that avoid the disadvantages associated with multi-processing have been proposed. These so-called "super-scalar" processors permit a sustained execution rate of more than one instruction per processor cycle, as opposed to conventional scalar processors which, while capable of handling multiple instructions in different pipeline stages in one cycle, are limited to a maximum pipeline capacity of one instruction per cycle. In contrast, a super-scalar pipeline architecture achieves concurrency between instructions both in different pipeline stages and within the same pipeline stage.

A super-scalar processor that executes more than one instruction per cycle, however, can only be effective when instructions can be supplied at a sufficient rate. It is readily apparent that instruction fetching can be a limiting factor in overall system performance if the average rate of instruction fetching is less than the average rate of instruction execution. Providing the necessary instruction bandwidth for sequential instructions is relatively easy, as the instruction fetcher can simply fetch several instructions per cycle. It is much more difficult, however, to provide sufficient instruction bandwidth in the presence of non-sequential fetches caused by branches, as the branches make the instruction fetching dependent on the results of instruction execution. Thus, the instruction fetcher can either stall or fetch incorrect instructions when the outcome of a branch is not known.

For example, FIG. 1 illustrates two instruction runs consisting of a number of instructions occupying four instruction-cache blocks (assuming a four-word cache block) in an instruction cache memory. The first instruction run consists of instructions S1-S5 that contain a branch to a second instruction run T1-T4. FIG. 2 illustrates how these instruction runs are sequenced through a four-instruction decoder and a two-instruction decoder, assuming for purposes of illustration that two cycles are required to determine the outcome of a branch. As would be expected, the four-instruction decoder provides a higher instruction bandwidth than the two-instruction decoder, but neither provides sufficient instruction bandwidth for a super-scalar processor. As illustrated in FIG. 3, the instruction bandwidth improves dramatically if the branch delays are reduced to zero.

The dependency between the instruction fetcher and the execution unit caused by branches can be reduced by predicting the outcome of the branch during an instruction fetch without waiting for the execution unit to indicate whether or not the branch should be taken. Branch prediction relies heavily on the fact that the outcome of a branch does not change frequently over a given period of time. The instruction fetcher can predict future branch executions using information collected on the outcome of the previous branch executions performed by the execution unit.

A conventional method for hardware branch prediction uses a branch target buffer to collect information about the most recently executed branches. See, for example, "Branch Prediction Strategies and Branch Target Buffer Design", by J.K.F. Lee and A.J. Smith, IEEE Computer, Vol. 17, pp. 6-22, January, 1984. Typically, the branch target buffer is accessed using an instruction address, and indicates whether or not the instruction at that address is a branch instruction. If the instruction is a branch instruction, the branch target buffer indicates the predicted outcome and the target address.

The hit ratio of a branch target buffer, i.e., the probability that a branch is found in the branch target buffer at the time it is fetched, increases as the size of the branch target buffer increases. FIG. 4 is a graph of the hit ratio for a branch target buffer for selected sample benchmark programs, and illustrates the necessity of a relatively large branch target buffer in order to obtain an acceptable prediction accuracy. Accordingly, it would be desirable to provide an improved hardware branch prediction architecture that would require less hardware support as compared with a conventional branch target buffer.

SUMMARY OF THE INVENTION

The present invention provides a super-scalar processor wherein branch-prediction information is provided within an instruction cache memory. Each instruction cache block stored in the instruction cache memory includes branch-prediction information fields in addition to instruction fields, which indicate the address of the instruction block's successor and information indicating the location of a branch instruction within the instruction block. Thus, the next cache block can be easily fetched without waiting on a decoder or execution unit to indicate the proper fetch action to be taken for correctly predicted branching.

More specifically, branch prediction is accomplished in accordance with the present invention by loading a plurality of instruction blocks into the instruction cache memory, wherein each of the instruction blocks includes a plurality of instructions and instruction fetch information. The instruction fetch information includes an address tag, a branch block index and a successor index that includes a successor valid bit. A fetch program counter is used to generate and supply a fetch program counter value to the instruction cache memory in order to prefetch one of the plurality of instruction blocks stored in the instruction cache memory. The processor determines whether the successor valid bit of the prefetched instruction block is set to a predetermined condition which indicates that a branch instruction within the prefetched instruction block is predicted as taken. If the successor valid bit is not set to the predetermined condition, the fetch program counter value is incremented and supplied to the instruction cache memory to prefetch a succeeding instruction block. If the successor valid bit is set to the predetermined condition, a predicted target branch address is generated by the instruction cache memory based on information contained in the instruction fetch information field associated with the instruction block. The predicted target branch address and the branch location of the branch instruction within the instruction cache memory are then stored in a branch prediction memory. The branch instruction is subsequently executed with a branch execution unit which generates an actual branch location address and a target branch address for the executed branch instruction. The actual branch location and the target branch address are then respectively compared with the branch location and predicted target branch address stored in the branch prediction memory. A misprediction signal is generated if the compared values are not equal, and the successor valid bit and instruction fetch information are updated for the instruction block in response to the misprediction signal.

The utilization of the instruction cache and branch prediction memory as described above provides branch prediction accuracy substantially identical to that of a branch target buffer without requiring as much hardware support.

BRIEF DESCRIPTION OF THE DRAWINGS

With the above as background, reference should now be made to the following detailed description of the preferred embodiments in conjunction with the drawings, in which:

FIG. 1 shows a sequence of two instruction runs to illustrate decoder behavior;

FIG. 2 illustrates the sequencing of the instruction runs shown in FIG. 1 through a two-instruction and four-instruction decoder;

FIG. 3 illustrates the improvements in instruction bandwidth for the instruction runs illustrated in FIG. 2 if branch delays are avoided;

FIG. 4 is a graph of the hit ratio of a branch target buffer;

FIG. 5 illustrates a preferred layout for an instruction-cache entry in accordance with the present invention;

FIG. 6 is an example of instruction-cache entries for the code sequence illustrated in FIG. 3;

FIG. 7 is a block diagram of a super-scalar processor according to the present invention;

FIG. 8 is a block diagram of an instruction cache employed in the super-scalar processor illustrated in FIG. 7;

FIG. 9 is a block diagram of a branch prediction FIFO employed in the super-scalar processor illustrated in FIG. 7; and

FIG. 10 is a block diagram of a branch execution unit employed in the super-scalar processor illustrated in FIG. 7.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The basic operation of an instruction cache for a super-scalar processor in accordance with the present invention will be discussed with reference to FIG. 5, which illustrates a preferred layout for an instruction-cache entry required by the super-scalar processor. In the example illustrated, the cache entry holds four instructions and instruction fetch information which is shown in expanded form to include a conventional address tag field and two additional fields: a successor index field which indicates both the next entry predicted to be fetched and the first instruction within the next entry predicted to be executed, and a branch block index field which indicates the location of a branch point within the instruction block. The successor index field does not specify a full instruction address, but is of sufficient size to select any instruction address within the instruction cache. The successor index field includes a successor valid bit that indicates a branch is predicted to be taken when set, and that a branch is not predicted to be taken when cleared.
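
By way of illustration only, the entry layout described above may be sketched in Python as follows. The names CacheEntry and decode_successor are hypothetical, as is the assumption that the low-order two bits of the successor index select one of the four instructions in the successor entry and that a newly created entry marks its last instruction as the branch block index.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

WORDS_PER_BLOCK = 4   # from the text: each entry holds four instructions
WORD_BITS = 2         # assumed: log2(WORDS_PER_BLOCK) low-order bits pick the word


@dataclass
class CacheEntry:
    """One instruction-cache entry: instructions plus fetch information."""
    address_tag: int
    instructions: List[int] = field(default_factory=lambda: [0] * WORDS_PER_BLOCK)
    successor_valid: bool = False       # cleared: no branch predicted taken
    successor_index: int = 0            # next entry and first word within it
    branch_block_index: int = WORDS_PER_BLOCK - 1   # assumed default: whole block valid


def decode_successor(entry: CacheEntry) -> Tuple[int, int]:
    """Split the successor index into (next entry, first instruction to execute)."""
    next_entry = entry.successor_index >> WORD_BITS
    first_word = entry.successor_index & (WORDS_PER_BLOCK - 1)
    return next_entry, first_word


# An entry whose branch (second instruction) is predicted taken to word 2 of entry 37.
entry = CacheEntry(address_tag=0x1A, successor_valid=True,
                   successor_index=(37 << WORD_BITS) | 2, branch_block_index=1)
next_entry, first_word = decode_successor(entry)
```

The successor index is deliberately shorter than a full instruction address, so the sketch stores it as a small integer rather than as a program counter value.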

FIG. 6 illustrates instruction-cache entries for the code sequence shown in FIG. 3, assuming a 64 Kbyte direct-mapped cache and the indicated instruction address. When a cache entry is first loaded, the address tag is set and the successor valid bit is cleared. The default for a newly-loaded entry, therefore, is to predict that a branch is not taken and the next sequential instruction block is to be fetched. FIG. 6 also illustrates that a branch target program counter can be constructed at branch points by concatenating the successor index field of the instruction block where the branch occurs to the address tag of the successor instruction block.
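
The concatenation described above may be sketched as follows. This is a minimal illustration under assumptions not stated in the text: 32-bit byte addresses and four-byte instructions, so that a 64 Kbyte direct-mapped cache is addressed by a 14-bit successor index followed by two byte-offset bits. The function name branch_target_pc and the sample values are hypothetical.

```python
def branch_target_pc(successor_index: int, successor_tag: int,
                     successor_bits: int, byte_bits: int = 2) -> int:
    """Concatenate the successor block's address tag with the successor index.

    successor_bits is the width of the successor index (enough to select any
    instruction in the cache); byte_bits is the assumed byte offset width
    within one instruction word.
    """
    assert 0 <= successor_index < (1 << successor_bits)
    return (successor_tag << (successor_bits + byte_bits)) | (successor_index << byte_bits)


# Assumed geometry for a 64 Kbyte cache of four-byte instructions:
# 16 address bits inside the cache = 14 successor-index bits + 2 byte bits.
target = branch_target_pc(successor_index=0x0123, successor_tag=0x0001,
                          successor_bits=14)
print(hex(target))  # 0x1048c
```

Because the tag is read from the successor entry at fetch time, a target constructed this way is only as current as the cache contents, which is why the prediction is later checked against the executed branch.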

The validity of instructions at the beginning of a current instruction block is preferably determined by the low-order bits of the successor index field in the preceding instruction block. The successor index of the preceding instruction block may point to any instruction within the current instruction block, and instructions up to this point in the current instruction block are not executed by the processor. The validity of instructions at the end of the block is determined by the branch block index, which indicates the point where a branch is predicted to be taken. The branch block index is required by an instruction decoder to determine valid instructions, while cache entries are retrieved based on the successor index fields alone.
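
The validity rule described above may be illustrated by the following sketch, which computes which of the four instruction positions in a block a decoder would treat as valid. The function name valid_slots is hypothetical, and the sketch assumes that the branch block index names the last instruction predicted to be executed and that a block entered sequentially starts at its first instruction.

```python
from typing import List, Optional

WORDS_PER_BLOCK = 4  # from the text: four instructions per block


def valid_slots(prev_successor_index: Optional[int],
                branch_block_index: int) -> List[bool]:
    """Return one flag per instruction position in the current block.

    prev_successor_index is the successor index of the preceding block when
    the current block was entered by a predicted-taken branch, or None when
    it was entered sequentially (assumed to start at position 0).
    """
    if prev_successor_index is None:
        first = 0
    else:
        first = prev_successor_index & (WORDS_PER_BLOCK - 1)  # low-order bits
    return [first <= slot <= branch_block_index for slot in range(WORDS_PER_BLOCK)]


# Entered at instruction 1 by a predicted branch; branch predicted taken at instruction 2.
print(valid_slots(prev_successor_index=(37 << 2) | 1, branch_block_index=2))
# [False, True, True, False]
```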

To check branch predictions, the processor keeps a list of predicted branches, stored in the order in which the branches are predicted, in a branch prediction FIFO associated with the instruction cache. Each entry on the list indicates the location of the branch in the instruction cache, which is identified by concatenating the successor index of the entry preceding the branching entry with the branch block index field. Each entry also contains a complete program-counter value for the target of the branch.

The processor executes all branches in their original program sequence with a branch execution unit, and compares information resulting from the execution of the branches with information at the head of the list of predicted branches. The following conditions must hold for a successful branch prediction. First, if the branch is taken, its location in the instruction cache must match the location of the next branch on the list contained in the branch prediction FIFO. This condition is required to detect a taken branch that was predicted to be not taken. Second, the predicted target address of the branch at the head of the list must match the next instruction address determined by executing the branch.

The second comparison is relevant only if the locations match, and is required primarily to detect a branch which was not taken that was predicted to be taken. However, as the predicted target address is based on the address tag of the successor block, this comparison also detects that cache replacement during execution has removed the original target entry. In addition, comparing program-counter values checks that indirect branches were properly predicted.

The branch is mispredicted if either or both of the above-described conditions do not hold. When a misprediction occurs, the appropriate cache entry must be fetched using the location of the branch determined by the execution unit. The successor valid bit and instruction fetch information for the incorrect instruction block must also be updated based on the misprediction to reflect the actual result of the execution of the branch. For example, the successor valid bit is cleared if a branch had been predicted as taken but was not taken, so that on the next fetch of the instruction block the branch will be predicted as not taken. Thus, the successor valid bit and instruction fetch information always reflect the actual result of the previous execution of the branch instruction.
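
The two conditions described above may be sketched as a single check against the head of the list of predicted branches. This is a simplified illustration only: the names Prediction and is_mispredicted are hypothetical, and branch locations and program-counter values are represented as plain integers.

```python
from typing import NamedTuple, Optional


class Prediction(NamedTuple):
    """Head-of-list entry: where a branch was predicted taken, and to where."""
    location: int    # location of the predicted branch in the instruction cache
    target_pc: int   # complete program-counter value of the predicted target


def is_mispredicted(head: Optional[Prediction], location: int,
                    next_pc: int, taken: bool) -> bool:
    """Apply both conditions to one executed branch.

    location and next_pc come from executing the branch; head is the oldest
    unchecked prediction, or None if no branch is currently predicted taken.
    """
    location_matches = head is not None and head.location == location
    # First condition: a taken branch must be the next branch on the list.
    if taken and not location_matches:
        return True   # taken, but predicted not taken
    # Second condition: relevant only when the locations match.
    if location_matches and head.target_pc != next_pc:
        return True   # not taken after all, wrong target, or target entry replaced
    return False


head = Prediction(location=0x40, target_pc=0x200)
print(is_mispredicted(head, 0x40, 0x200, taken=True))    # False
print(is_mispredicted(head, 0x40, 0x44, taken=False))    # True
print(is_mispredicted(head, 0x20, 0x300, taken=True))    # True
print(is_mispredicted(head, 0x20, 0x24, taken=False))    # False
```

In this sketch a branch that is not taken and is not at the head of the list passes both conditions, which corresponds to a correct not-taken prediction.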

With the above as background, reference should now be made to FIG. 7 for a detailed description of a preferred embodiment of the invention. FIG. 7 illustrates a block diagram of a super-scalar processor that includes a bus interface unit (BIU) 10, an instruction cache 12, a branch prediction FIFO 14, an instruction decoder 16, a register file 18, a reorder buffer 20, a branch execution unit 22, an arithmetic logic unit (ALU) 24, a shifter unit 30, a load unit 32, a store unit 33, and a data cache 34.

The reorder buffer 20 is managed as a FIFO. When an instruction is decoded by the instruction decoder 16, a corresponding entry is allocated in the reorder buffer 20. The result value of the decoded instruction is written into the allocated entry when the execution of the instruction is completed. The result value is then written into the register file 18 if there are no exceptions associated with the instruction. If the instruction is not complete when its associated entry reaches the head of the reorder buffer 20, the advancement of the reorder buffer 20 is halted until the instruction is completed; additional entries, however, can continue to be allocated. If there is an exception or branch misprediction, the entire contents of the reorder buffer 20 are discarded.
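
The FIFO management of the reorder buffer described above may be sketched as follows. The class and method names are hypothetical, registers are modelled as dictionary keys, and exception handling is reduced to a single flush operation; the sketch illustrates the ordering behavior only and is not a description of the actual circuit.

```python
from collections import deque


class ReorderBuffer:
    """Entries are allocated at decode and retired strictly from the head."""

    def __init__(self):
        self.entries = deque()
        self.register_file = {}

    def allocate(self, dest):
        """Allocate an entry when an instruction is decoded."""
        entry = {"dest": dest, "value": None, "done": False}
        self.entries.append(entry)
        return entry

    def complete(self, entry, value):
        """Record the result when execution finishes, possibly out of order."""
        entry["value"] = value
        entry["done"] = True

    def retire(self):
        """Write completed head entries to the register file; stall otherwise."""
        retired = 0
        while self.entries and self.entries[0]["done"]:
            entry = self.entries.popleft()
            self.register_file[entry["dest"]] = entry["value"]
            retired += 1
        return retired

    def flush(self):
        """Discard all entries on an exception or branch misprediction."""
        self.entries.clear()


rob = ReorderBuffer()
first = rob.allocate("r1")
second = rob.allocate("r2")
rob.complete(second, 7)      # younger instruction finishes first
stalled = rob.retire()       # head is incomplete, so nothing retires
rob.complete(first, 5)
retired = rob.retire()       # both entries now retire in program order
```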

As illustrated in FIG. 8, the instruction cache 12 includes: an instruction store array 36 which is a direct mapped instruction cache organized as 512 instruction blocks of four words each; a tag array 38 having 512 entries composed of a 19 bit tag and a single valid bit for the entire block; a dual ported successor array 40 having 512 entries composed of an 11 bit successor index and a successor valid bit which indicates when set that the successor index stored in the successor array 40 should be used to access the instruction store array 36, and indicates when cleared that no branch is predicted within the instruction block; a dual ported block status array 42 that contains a branch block indicator for each instruction block in the instruction cache 12 which indicates the last instruction predicted to be executed within a block; a fetch program counter (PC) 44 (including a PC latch 46, a MUX unit 48 and an incrementer (INC) 50) that generates a PC value that is used for prefetching the instruction stream from the instruction cache 12; an instruction fetch control unit 52 that controls the fetching of instructions from the instruction cache 12, the replacement of cache blocks on misses, and the reformatting of the successor array 40 and block status array 42 on branches that are mispredicted; and an instruction register latch 54 which is loaded with the instructions to be provided to the instruction decoder 16.
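
The field widths recited above are mutually consistent if one assumes 32-bit byte addresses and four-byte instruction words, neither of which is stated explicitly. The following sketch derives the widths from the cache geometry under those assumptions and splits a program-counter value into its fields; the names log2_exact and split_pc are hypothetical.

```python
ADDRESS_BITS = 32     # assumption: 32-bit byte addresses
BYTES_PER_WORD = 4    # assumption: four-byte instruction words
BLOCKS = 512          # from the text
WORDS_PER_BLOCK = 4   # from the text


def log2_exact(n: int) -> int:
    """Return log2 of a power of two."""
    assert n > 0 and n & (n - 1) == 0, "expected a power of two"
    return n.bit_length() - 1


byte_bits = log2_exact(BYTES_PER_WORD)           # 2
word_bits = log2_exact(WORDS_PER_BLOCK)          # 2
block_bits = log2_exact(BLOCKS)                  # 9
successor_index_bits = block_bits + word_bits    # selects any instruction in the cache
tag_bits = ADDRESS_BITS - block_bits - word_bits - byte_bits
cache_bytes = BLOCKS * WORDS_PER_BLOCK * BYTES_PER_WORD


def split_pc(pc: int):
    """Split a byte address into (tag, block index, word within block)."""
    word = (pc >> byte_bits) & (WORDS_PER_BLOCK - 1)
    block = (pc >> (byte_bits + word_bits)) & (BLOCKS - 1)
    tag = pc >> (byte_bits + word_bits + block_bits)
    return tag, block, word


print(successor_index_bits, tag_bits, cache_bytes)  # 11 19 8192
print(split_pc(0x1048C))                            # (8, 72, 3)
```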

The branch prediction FIFO 14 is used to maintain information related to every predicted branch within an instruction block. Specifically, the location in the cache where the branch is predicted to occur (i.e., the branch location) as well as the predicted branch target PC of the branch are stored within the branch prediction FIFO 14. As illustrated in FIG. 9, the branch prediction FIFO 14 is preferably implemented as a fixed array with a target PC FIFO and a branch location FIFO, incrementing read/write pointers 56 and 58, and also includes a branch location comparator 62 and a target PC comparator 60 which are respectively coupled to a branch location data bus (CPC) and a target PC data bus (TPC). The output signals generated by the target PC comparator 60 and the branch location comparator 62 are provided to a branch FIFO control circuit 63. The FIFO 14 could alternatively be implemented as a shiftable array or a circular FIFO.

The branch execution unit 22 contains the hardware that actually executes the branch instructions and writes the branch results back to the reorder buffer 20. As shown in FIG. 10, the branch execution unit 22 includes a branch reservation station 62, a branch computation unit 64 and a result bus interface 66. The reservation station 62 is a FIFO array which receives decoded instructions from the instruction decoder 16 and operand information from the register file 18 and reorder buffer 20 and holds this information until the decoded instruction is free from dependencies and the branch computation unit 64 is free to execute the instruction. The result bus interface 66 couples the branch execution unit 22 to the CPC bus and TPC bus, which in turn are coupled to the branch location comparator 62 and the target PC comparator 60 of the branch prediction FIFO 14 as illustrated in FIG. 9.

In operation, the instruction cache 12 is loaded with instructions from an instruction memory via the BIU 10. The fetch PC 44 supplies a predicted fetch PC value to the instruction cache 12 in order to prefetch an instruction stream. As previously stated, the successor valid bit for each instruction block is cleared when the instruction block is first loaded into the instruction cache 12. Thus, when a given instruction block is first fetched from the instruction cache 12, any branch in the block is predicted as not taken. The prefetched instruction block is supplied to the instruction decoder 16 via the instruction register latch 54. The predicted fetch PC is then incremented via the incrementer 50 and loaded back into the fetch PC latch 46 via the MUX unit 48. The resulting fetch PC is then supplied to the instruction cache 12 in order to fetch the next sequential instruction block in the instruction store.

The branch execution unit 22 processes any branch instruction contained in the first prefetched instruction block, and generates an actual PC value and target PC value for the executed branch instruction. Note that if the branch is not taken on execution, the target PC value generated by the branch execution unit 22 will be the next sequential value after the actual PC value, i.e., the term "target PC" in this sense does not necessarily mean the target of an executed branch, but instead indicates the address of the next instruction block to be executed regardless of the branch results. The actual PC value and the target PC value are respectively supplied to the CPC bus and the TPC bus and loaded into the branch location comparator and the target PC comparator in the branch prediction FIFO.

Where a branch was predicted as not taken but was taken on execution, the comparison of the actual PC value supplied by the branch execution unit 22 with the branch location value supplied from the branch location FIFO of the branch prediction FIFO 14 will fail. The branch prediction FIFO 14 resets and generates a branch misprediction signal which is supplied to the instruction fetch control unit 52 of the instruction cache 12. The target PC from the branch execution unit 22 is then loaded into the fetch PC latch 46 via the MUX unit 48 and the successor array is updated to set the successor valid bit under control of the instruction fetch control unit 52. Thus, the branch will be predicted as taken on subsequent fetches of the instruction block.

When the successor valid bit is set indicating a branch is predicted as taken, the value of the fetch PC latch is loaded into the next available entry in the branch prediction FIFO. A reconstructed predicted fetch PC formed from the successor index and the tag field read out of the tag array is loaded via the MUX 48 into the fetch PC latch 46. This reconstructed fetch PC is supplied to the instruction store array 36 to fetch the next instruction and to the branch prediction FIFO. Thus, the branch prediction FIFO entry contains the branch location of the branch as well as the predicted target of the branch.
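
The selection of the next fetch PC described in the preceding paragraphs may be sketched as follows. The sketch assumes the 512-block, four-word geometry of FIG. 8 with four-byte words, models the tag array as a dictionary, and treats a sequential increment as an advance to the start of the next block; the function name next_fetch_pc and these modelling choices are hypothetical.

```python
from collections import deque

BYTE_BITS = 2     # assumption: four-byte instruction words
WORD_BITS = 2     # four words per block
BLOCK_BITS = 9    # 512 blocks
BLOCK_BYTES = 1 << (BYTE_BITS + WORD_BITS)
TAG_SHIFT = BYTE_BITS + WORD_BITS + BLOCK_BITS


def next_fetch_pc(fetch_pc, successor_valid, successor_index, tag_array, fifo):
    """Return the next fetch PC, recording a FIFO entry for a predicted-taken branch."""
    if not successor_valid:
        # No branch predicted taken: fetch the next sequential block.
        return (fetch_pc & ~(BLOCK_BYTES - 1)) + BLOCK_BYTES
    # Branch predicted taken: rebuild a full PC from the successor index and
    # the address tag currently stored for the successor block.
    successor_block = successor_index >> WORD_BITS
    target_pc = (tag_array[successor_block] << TAG_SHIFT) | (successor_index << BYTE_BITS)
    fifo.append((fetch_pc, target_pc))   # (branch location, predicted target)
    return target_pc


fifo = deque()
tag_array = {72: 8}   # hypothetical contents: block 72 currently holds tag 8
sequential = next_fetch_pc(0x100, False, 0, tag_array, fifo)
predicted = next_fetch_pc(0x100, True, (72 << WORD_BITS) | 3, tag_array, fifo)
print(hex(sequential), hex(predicted))  # 0x110 0x1048c
```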

The branch execution unit 22 subsequently executes the branch instruction and generates an actual PC value and a target PC value which are supplied to the branch location comparator and the target PC comparator in the branch prediction FIFO. If the branch was predicted to be taken, the PC value generated by the branch execution unit 22 will always match the branch location loaded from the branch location FIFO. Three possible conditions, however, will result in the target PC value generated by the branch execution unit 22 not matching the target PC stored in the branch prediction FIFO 14: (1) the branch was predicted as taken but was not taken, in which case the successor valid bit must be cleared; (2) the branch executed a subroutine return to an address which did not match the predicted address, thereby requiring that the successor index be updated; or (3) cache replacement occurred prior to the execution of the branch instruction, requiring the reloading of the instruction cache.

The principal hardware cost of the above-described branch prediction scheme is the increase in the cache size caused by the successor index and branch block index fields associated with each entry in the instruction cache. This increase is minimal when compared with other hardware prediction schemes, however, as the present invention saves storage space by predicting only one taken branch per cache block, and predicting non-taken branches by not storing any branch information associated with the instruction block into the successor index. For an 8 Kbyte direct mapped cache, the additional fields add about 8% to the cache storage required. The increase in overall system performance due to branch prediction, however, justifies the increased size requirement for the instruction cache.
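
As a rough check of the storage figure given above, the per-block overhead can be estimated from the field widths of FIG. 8. The estimate below assumes 32-bit instruction words and, since the width of the branch block index is not stated, a two-bit branch block index; both are assumptions, and the exact percentage depends on what is counted as baseline storage.

```python
INSTRUCTION_BITS = 4 * 32      # four instructions per block; 32-bit words assumed
TAG_BITS = 19                  # from the text
BLOCK_VALID_BITS = 1           # from the text
SUCCESSOR_INDEX_BITS = 11      # from the text
SUCCESSOR_VALID_BITS = 1       # from the text
BRANCH_BLOCK_INDEX_BITS = 2    # assumption: enough to name one of four instructions

baseline_bits = INSTRUCTION_BITS + TAG_BITS + BLOCK_VALID_BITS
successor_bits = SUCCESSOR_INDEX_BITS + SUCCESSOR_VALID_BITS

successor_overhead = successor_bits / baseline_bits
total_overhead = (successor_bits + BRANCH_BLOCK_INDEX_BITS) / baseline_bits

print(f"successor fields only: {100 * successor_overhead:.1f}%")   # 8.1%
print(f"with branch block index: {100 * total_overhead:.1f}%")     # 9.5%
```

Under these assumptions the successor fields alone account for roughly 8% of the baseline storage, in line with the figure quoted above, and the assumed branch block index adds somewhat more.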

The requirement for updating the cache entry when a branch is mispredicted does conflict with the requirement to fetch the correct branch target, i.e., unless it is possible to read and write the fetch information for two different entries simultaneously, the updating of the fetch information on a mispredicted branch takes a cycle away from instruction fetching. The requirement for an additional cycle causes only a small degradation in performance, however, as mispredicted branches occur infrequently and the increase in performance associated with branch prediction easily outweighs any degradation in performance due to the additional cycles required by mispredicted branches.

The invention has been described with particular reference to certain preferred embodiments thereof. The invention is not limited to these disclosed embodiments and modifications and variations may be made within the scope of the appended claims.

What is claimed is:
1. A branch prediction method, comprising the steps of:
a. loading a plurality of instruction blocks into an instruction cache memory, each of said instruction blocks comprising a plurality of instructions and instruction fetch information, wherein said instruction fetch information comprises an address tag, a predicted target branch address, a branch block index and a successor index that includes a successor valid bit;
b. generating and supplying a fetch program counter value to said instruction cache memory in order to prefetch one of said plurality of instruction blocks stored in said instruction cache memory;
c. determining whether said successor valid bit of said prefetched instruction block is set to a predetermined condition which indicates that a branch instruction within said prefetched instruction block is predicted as taken;
d. generating a branch location address indicative of the location of said branch instruction within said instruction cache memory and a predicted target branch address if said successor valid bit is set to said predetermined condition;
e. storing said predicted target branch address and said branch location address in a branch prediction memory if said successor valid bit is set to said predetermined condition;
f. incrementing said fetch program counter value and supplying the incremented fetch program counter value to said instruction cache memory to prefetch a succeeding instruction block if said successor valid bit is not set to said predetermined condition;
g. executing said branch instruction with an execution unit and generating an actual branch address and a target branch address for the executed branch instruction;
h. comparing said actual branch address generated by said execution unit with said branch location address stored in said branch prediction memory and generating a first misprediction signal if a branch corresponding to said branch instruction was taken on execution and either said actual branch address is not equal to said branch location address or said executed target branch address is not equal to said predicted target branch address stored in said branch prediction memory;
i. comparing said actual branch address with said branch location address stored in said branch prediction memory and generating a second misprediction signal if said branch corresponding to said branch instruction was not taken on execution and said actual branch address is equal to said branch location address;
j. updating the successor valid bit and instruction fetch information for said instruction block in response to said first or second misprediction signal; and
k. updating said fetch program counter value with the target branch address in response to said first or second misprediction signal.
2. A method as set forth in claim 1, wherein said predicted target branch address is generated by concatenating said successor index of said prefetched instruction block to an address tag of a successor instruction block.
3. A method as set forth in claim 2, wherein said branch location address is generated by concatenating a successor index from a preceding instruction block to an address tag of said prefetched instruction block.
4. An apparatus comprising:
a. first means for storing a plurality of instruction blocks, each of said instruction blocks comprising a plurality of instructions and instruction fetch information, wherein said instruction fetch information comprises an address tag, a predicted target branch address, a branch block index and a successor index that includes a successor valid bit;
b. second means for generating and supplying a fetch program counter value to said first means in order to prefetch one of said plurality of instruction blocks stored in said first means;
c. third means for determining whether said successor valid bit of said prefetched instruction block is set to a predetermined condition which indicates that a branch instruction within said prefetched instruction block is predicted as taken;
d. fourth means for generating a branch location address and a predicted target branch address if said successor valid bit is set to said predetermined condition;
e. fifth means for storing said predicted target branch address and said branch location address if said successor valid bit is set to said predetermined condition;
f. sixth means for incrementing said fetch program counter value and supplying the incremented fetch program counter value to said instruction cache memory to prefetch a succeeding instruction block if said successor valid bit is not set to said predetermined condition;
g. seventh means for executing said branch instruction and generating an actual branch address and a target branch address for the executed branch instruction;
h. eighth means for comparing said actual branch address generated by said seventh means with said branch location address stored in said sixth means and generating a first misprediction signal if a branch corresponding to said branch instruction was taken on execution and either said actual branch address is not equal to said branch location address or said executed target branch address is not equal to said predicted branch address stored in said sixth means;
i. ninth means for comparing said actual branch address with said branch location address stored in said sixth means and generating a second misprediction signal if said branch corresponding to said branch instruction was not taken on execution and said actual branch address is equal to said branch location address;
j. tenth means for updating the successor valid bit and instruction fetch information for said instruction block in response to said first or second misprediction signal; and
k. eleventh means for updating said fetch program counter value with the target branch address in response to said first or second misprediction signal.
 5. An apparatus as claimed in claim 4, wherein said . .seventh.!. .Iadd.fourth .Iaddend.means generates said predicted target branch address by concatenating said successor index of said prefetched instruction block to an address tag of a successor instruction block.
 6. A method as set forth in claim 4, wherein said . .seventh.!. .Iadd.fourth .Iaddend.means generates said branch location .Iadd.address .Iaddend.by concatenating a successor index from a preceding instruction block . .with the branch location address.!. .Iadd.to an address tag .Iaddend.of said prefetched instruction block.
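The concatenation recited in claims 5 and 6 can be sketched as simple bit operations. This Python sketch is illustrative only; the 12-bit index width and the names `concat_address`, `predicted_target_address`, and `branch_location_address` are assumptions, since the claims do not fix field widths or name any routine.

```python
INDEX_BITS = 12  # assumed width of the successor index field


def concat_address(address_tag: int, successor_index: int,
                   index_bits: int = INDEX_BITS) -> int:
    """Place the tag in the high bits and the successor index in the low bits."""
    if not 0 <= successor_index < (1 << index_bits):
        raise ValueError("successor index does not fit in the index field")
    return (address_tag << index_bits) | successor_index


def predicted_target_address(prefetched_successor_index: int,
                             successor_block_tag: int) -> int:
    # Claim 5: successor index of the prefetched block joined to the
    # address tag of the successor block.
    return concat_address(successor_block_tag, prefetched_successor_index)


def branch_location_address(preceding_successor_index: int,
                            prefetched_block_tag: int) -> int:
    # Claim 6: successor index from the preceding block joined to the
    # address tag of the prefetched block.
    return concat_address(prefetched_block_tag, preceding_successor_index)
```

For example, with the assumed 12-bit index, a tag of `0xABCDE` and an index of `0x123` yield the address `0xABCDE123`.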
 7. An apparatus comprising: a bus interface unit, an instruction cache memory coupled to said bus interface unit and configured to receive a plurality of instruction blocks, each of said instruction blocks comprising a plurality of instructions and instruction fetch information, wherein said instruction fetch information comprises an address tag, a branch block index and a successor index that includes a successor valid bit; a branch prediction memory coupled to said instruction cache memory; an instruction decoder coupled to said instruction cache memory, . .an instruction branch memory coupled to said instruction cache memory,.!. wherein when said successor valid bit is not set to a predetermined condition.Iadd., .Iaddend.a fetch program counter value is incremented and supplied to said instruction cache memory for prefetching a succeeding instruction block, and when said successor valid bit is set to the predetermined condition, a predicted target branch address is generated by said instruction cache memory based on information contained in said instruction fetch information and said predicted target branch address within the instruction cache memory is stored in said branch prediction . .said.!. memory; and a processing unit including a branch execution unit coupled to said instruction decoder and a register file, wherein said branch instruction is subsequently executed with said branch execution unit which generates an actual branch location address and a target branch address for said executed branch instruction and said actual branch location .Iadd.address .Iaddend.and the target branch address are respectively compared with the branch location .Iadd.address .Iaddend.and said predicted target branch address stored in the branch prediction memory, generating a misprediction signal if .Iadd.said branch instruction was taken on execution and .Iaddend.the compared values are not equal, and said successor valid bit and said instruction fetch information being updated for the instruction block in response to the misprediction signal and updating said .Iadd.fetch .Iaddend.program counter value with the target branch address .Iadd.in response to the misprediction signal.Iaddend..
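The fetch-side behavior recited in claim 7 can be modeled as one step per prefetched block. The Python sketch below is illustrative only and rests on several assumptions: a 16-byte block size, a hypothetical `FetchInfo` record that holds the already-generated branch location and predicted target, and the premise that the predicted target becomes the next fetch program counter value, which the abstract implies but claim 7 does not state outright.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

BLOCK_BYTES = 16  # assumed block size; the claims do not specify one


@dataclass
class FetchInfo:
    """Hypothetical view of the fetch information of one prefetched block."""
    successor_valid: bool                   # True models the "predetermined condition"
    branch_location: Optional[int] = None   # generated branch location address
    predicted_target: Optional[int] = None  # generated predicted target address


def next_fetch_pc(fetch_pc: int, info: FetchInfo,
                  prediction_memory: List[Tuple[int, int]]) -> int:
    if not info.successor_valid:
        # No taken branch predicted: increment to the succeeding block.
        return fetch_pc + BLOCK_BYTES
    # Taken branch predicted: store the prediction for the later comparison
    # and (by assumption) redirect the fetch to the predicted target.
    prediction_memory.append((info.branch_location, info.predicted_target))
    return info.predicted_target
```

The list passed as `prediction_memory` stands in for the branch prediction memory; entries recorded here are the values later compared against the branch execution unit's results.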
 8. An apparatus as claimed in claim 7, wherein said instruction cache memory includes an instruction store array coupled to said bus interface unit, a tag array coupled to said instruction store array, a successor array coupled to said tag array, and a block status array coupled to said successor array.
 9. An apparatus as claimed in claim 8, wherein said instruction cache memory further comprises a fetch program counter that includes a PC latch, an incrementer, and a MUX unit.
 10. An apparatus as claimed in claim 9, wherein said instruction cache memory further comprises an instruction fetch control circuit coupled to said fetch program counter, wherein said instruction fetch control circuit controls the operation of said MUX unit to selectively load the PC latch with a value generated by said incrementer, a value supplied by said branch . .control.!. .Iadd.execution .Iaddend.unit, or a reconstructed fetch PC value.
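The three-way selection recited in claim 10 can be pictured as a multiplexer whose select input is driven by the instruction fetch control circuit. The Python sketch below is illustrative only; the enumeration `PcSource` and the function `load_pc_latch` are hypothetical names, and the claim does not say how the control circuit chooses among the inputs.

```python
from enum import Enum, auto


class PcSource(Enum):
    """Hypothetical select values for the MUX unit."""
    INCREMENTER = auto()    # value generated by the incrementer
    BRANCH_UNIT = auto()    # value supplied by the branch execution unit
    RECONSTRUCTED = auto()  # reconstructed fetch PC value


def load_pc_latch(select: PcSource, incremented_pc: int,
                  branch_unit_pc: int, reconstructed_pc: int) -> int:
    # Model the MUX: exactly one of the three inputs reaches the PC latch.
    inputs = {
        PcSource.INCREMENTER: incremented_pc,
        PcSource.BRANCH_UNIT: branch_unit_pc,
        PcSource.RECONSTRUCTED: reconstructed_pc,
    }
    return inputs[select]
```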
 11. An apparatus as claimed in claim 7, wherein said branch prediction memory comprises a branch target FIFO and a branch location FIFO.
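The pair of FIFOs recited in claim 11 can be modeled with two queues that are written together at prediction time and read together when a branch executes. The Python sketch below is illustrative only; the class name `BranchPredictionMemory` and its methods are hypothetical, and it assumes that branches execute in the same order in which their predictions were recorded.

```python
from collections import deque
from typing import Deque, Tuple


class BranchPredictionMemory:
    """Hypothetical model of a branch target FIFO and a branch location FIFO."""

    def __init__(self) -> None:
        self.target_fifo: Deque[int] = deque()
        self.location_fifo: Deque[int] = deque()

    def record(self, branch_location: int, predicted_target: int) -> None:
        # Both FIFOs are written together so their entries stay paired.
        self.location_fifo.append(branch_location)
        self.target_fifo.append(predicted_target)

    def oldest(self) -> Tuple[int, int]:
        # Entries that would be compared against the next executed branch.
        return self.location_fifo[0], self.target_fifo[0]

    def retire(self) -> Tuple[int, int]:
        # Remove the oldest pair once its branch has been checked.
        return self.location_fifo.popleft(), self.target_fifo.popleft()
```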
 12. An apparatus as claimed in claim 11, wherein said branch prediction memory further comprises a target PC comparator coupled to said branch target FIFO and a bus that is coupled to said branch execution unit, and a branch location comparator coupled to said branch location FIFO and a bus that is coupled to said branch execution unit, wherein the output of said target PC comparator and said branch location comparator are coupled to a control circuit. .Iadd.
 13. A branch prediction method comprising the steps of: a. loading a plurality of instruction blocks into an instruction cache memory, each of said instruction blocks comprising a plurality of instructions and instruction fetch information, wherein said instruction fetch information comprises a successor index indicative of a predicted target branch address and a successor valid bit; b. generating and supplying a fetch program counter value to said instruction cache memory in order to prefetch one of said plurality of instruction blocks stored in said instruction cache memory; c. determining whether said successor valid bit of said prefetched instruction block is set to a predetermined condition which indicates that a branch instruction within said prefetched instruction block is predicted as taken; d. generating a branch location address indicative of the location of said branch instruction within said instruction cache memory and a predicted target branch address if said successor valid bit is set to said predetermined condition; e. storing said predicted target branch address and said branch location address in a branch prediction memory if said successor valid bit is set to said predetermined condition; f. incrementing said fetch program counter value and supplying the incremented fetch program counter value to said instruction cache memory to prefetch a succeeding instruction block if said successor valid bit is not set to said predetermined condition; g. executing said branch instruction with an execution unit and generating an actual branch address and a target branch address for the executed branch instruction; h. comparing said actual branch address generated by said execution unit with said branch location address stored in said branch prediction memory and generating a first misprediction signal if said branch instruction was taken on execution and either said actual branch address is not equal to said branch location address or said executed target branch address is not equal to said predicted target branch address stored in said branch prediction memory; i. comparing said actual branch address with said branch location address stored in said branch prediction memory and generating a second misprediction signal if said branch instruction was not taken and said actual branch address is equal to said branch location address; j. updating the successor valid bit and instruction fetch information for said instruction block in response to said first or second misprediction signal; and k. updating said fetch program counter value with the target branch address in response to said first or second misprediction signal..Iaddend..Iadd.
 14. A method as set forth in claim 13, wherein said instruction fetch information further comprises an address tag and wherein said predicted target branch address is generated by concatenating said successor index of said prefetched instruction block to an address tag of a successor instruction block..Iaddend.
.Iadd.15. A method as set forth in claim 14, wherein said branch location address is generated by concatenating a successor index from a preceding instruction block to an address tag of said prefetched instruction block..Iaddend.
.Iadd.16. An apparatus comprising: a. first means for storing a plurality of instruction blocks, each of said instruction blocks comprising a plurality of instructions and instruction fetch information, wherein said instruction fetch information comprises a successor index indicative of a predicted target branch address and a successor valid bit; b. second means for generating and supplying a fetch program counter value to said first means in order to prefetch one of said plurality of instruction blocks stored in said first means; c. third means for determining whether said successor valid bit of said prefetched instruction block is set to a predetermined condition which indicates that a branch instruction within said prefetched instruction block is predicted as taken; d. fourth means for generating a branch location address and a predicted target branch address if said successor valid bit is set to said predetermined condition; e. fifth means for storing said predicted target branch address and said branch location address if said successor valid bit is set to said predetermined condition; f. sixth means for incrementing said fetch program counter value and supplying the incremented fetch program counter value to said first means to prefetch a succeeding instruction block if said successor valid bit is not set to said predetermined condition; g. seventh means for executing said branch instruction and generating an actual branch address and a target branch address for the executed branch instruction; h. eighth means for comparing said actual branch address generated by said seventh means with said branch location address stored in said sixth means and generating a first misprediction signal if a branch corresponding to said branch instruction was taken on execution and either said actual branch address is not equal to said branch location address or said executed target branch address is not equal to said predicted target branch address stored in said fifth means; i. ninth means for comparing said actual branch address with said branch location address stored in said sixth means and generating a second misprediction signal if said branch instruction was not taken on execution and said actual branch address is equal to said branch location address; j. tenth means for updating the successor valid bit and instruction fetch information for said instruction block in response to said first or second misprediction signal; and k. eleventh means for updating said fetch program counter value with the target branch address in response to said first or second misprediction signal..Iaddend.
.Iadd.17. An apparatus as claimed in claim 16, wherein said instruction fetch information further comprises an address tag and wherein said fourth means generates said predicted target branch address by concatenating said successor index of said prefetched instruction block to an address tag of a successor instruction block..Iaddend.
.Iadd.18. A method as set forth in claim 16, wherein said instruction fetch information further comprises an address tag and wherein said fourth means generates said branch location address by concatenating a successor index from a preceding instruction block to an address tag of said prefetched instruction block..Iaddend.
.Iadd.19. 
An apparatus comprising: an instruction cache memory configured to receive a plurality of instruction blocks, each of said instruction blocks comprising a plurality of instructions and instruction fetch information, wherein said instruction fetch information comprises a successor index indicative of a predicted target branch address and a successor valid bit; a branch prediction memory coupled to said instruction cache memory; an instruction decoder coupled to said instruction cache memory, wherein when said successor valid bit is not set to a predetermined condition, a fetch program counter value is incremented and supplied to said instruction cache memory for prefetching a succeeding instruction block, and when said successor valid bit is set to the predetermined condition, a predicted target branch address is generated for a branch location address by said instruction cache memory based on information contained in said instruction fetch information, and wherein said predicted target branch address and said branch location address are stored in said branch prediction memory; and a processing unit including a branch execution unit coupled to said instruction decoder, wherein said branch instruction is subsequently executed by said branch execution unit which generates an actual branch location address and a target branch address for said executed branch instruction and said actual branch location address and the target branch address are respectively compared with the branch location address and said predicted target branch address stored in the branch prediction memory, generating a misprediction signal if a branch corresponding to said branch instruction was taken on execution and the compared values are not equal, and said successor index being updated for the instruction block in said instruction cache memory in response to the misprediction signal and updating said fetch program counter value with the target branch address in response to said misprediction signal..Iaddend.
.Iadd.20. An apparatus as claimed in claim 19, wherein said instruction cache memory includes an instruction store array, a tag array coupled to said instruction store array, a successor array coupled to said tag array, and a block status array coupled to said successor array..Iaddend.
.Iadd.21. An apparatus as claimed in claim 20, wherein said instruction cache memory further comprises a fetch program counter that includes a PC latch, an incrementer, and a MUX unit..Iaddend.
.Iadd.22. An apparatus as claimed in claim 21, wherein said instruction cache memory further comprises an instruction fetch control circuit coupled to said fetch program counter, wherein said instruction fetch control circuit controls the operation of said MUX unit to selectively load the PC latch with a value generated by said incrementer, a value supplied by said branch control unit, or a reconstructed fetch PC value..Iaddend.
.Iadd.23. An apparatus as claimed in claim 19, wherein said branch prediction memory comprises a branch target FIFO and a branch location FIFO..Iaddend.
.Iadd.24. An apparatus as claimed in claim 23, wherein said branch prediction memory further comprises a target PC comparator coupled to said branch target FIFO and a bus that is coupled to said branch execution unit, and a branch location comparator coupled to said branch location FIFO and a bus that is coupled to said branch execution unit, wherein the output of said target PC comparator and said branch location comparator are coupled to a control circuit..Iaddend.
.Iadd.25. An apparatus for prefetching branch instructions for a processor, comprising: a. first means for storing a plurality of instruction blocks, each of said instruction blocks comprising a plurality of instructions and instruction fetch information, wherein said instruction fetch information comprises an index field indicating a succeeding instruction block predicted to be fetched and a branch/no branch prediction; b. 
second means for generating and supplying a fetch program counter value to said first means in order to prefetch one of said plurality of instruction blocks stored in said first means as a prefetched instruction block; c. third means for reading said instruction fetch information of said prefetched instruction block and incrementing said fetch program counter value and supplying said incremented fetch program counter value to said first means if said branch/no branch prediction stored within said instruction fetch information of said prefetched instruction block indicates a no branch condition, and updating said fetch program counter value with said succeeding instruction block stored in said instruction fetch information of said prefetched instruction block if said branch/no branch prediction stored within said instruction fetch information of said prefetched instruction block indicates a branch condition; d. fourth means for storing a branch location address and a corresponding predicted target branch address if said branch/no branch prediction stored within said instruction fetch information of said prefetched instruction block indicates said branch condition; e. fifth means for executing a branch instruction contained in said prefetched instruction block and generating an actual target branch address as a result of said execution of said branch instruction; f. sixth means for comparing said actual target branch address with said predicted target branch address corresponding to said branch instruction stored in said fourth means, wherein when a branch corresponding to said branch instruction was taken on execution and said comparison result indicates that said branch location address stored in said fourth means corresponds to said branch instruction executed by said fifth means and said predicted target branch address is not equivalent to said actual target branch address, sending a first update signal to said first means to replace said index field with said actual target branch address; and g. seventh means for comparing said branch location address stored in said fourth means with an address of said branch instruction executed by said fifth means and for sending a second update signal to said first means to update said branch/no branch prediction to said no branch condition if said branch corresponding to said branch instruction was not taken on execution and said comparison result indicates that said address of said branch instruction is equal to said branch location address stored in said fourth means..Iaddend.
.Iadd.26. A method of prefetching branch instructions for a processor, comprising the steps of: a. loading a plurality of instruction blocks into an instruction cache memory, wherein each of said instruction blocks comprises a plurality of instructions and instruction fetch information, wherein said instruction fetch information comprises an index field indicating a succeeding instruction block predicted to be fetched and a branch/no branch prediction; b. generating and supplying a fetch program counter value to said instruction cache memory in order to prefetch one of said plurality of instruction blocks as a prefetched instruction block; c. 
reading said instruction fetch information of said prefetched instruction block and incrementing said fetch program counter value if said branch/no branch prediction stored within said instruction fetch information of said prefetched instruction block indicates a no branch condition, and updating said fetch program counter value with said succeeding instruction block stored in said instruction fetch information of said prefetched instruction block if said branch/no branch prediction stored within said instruction fetch information of said prefetched instruction block indicates a branch condition; d. storing a branch location address and a corresponding predicted target branch address in a branch prediction memory if said branch/no branch prediction stored within said instruction fetch information of said prefetched instruction block indicates said branch condition; e. executing a branch instruction contained in said prefetched instruction block and generating an actual target branch address as a result of said execution of said branch instruction; f. comparing said actual target branch address with said predicted target branch address corresponding to said branch instruction stored in said branch prediction memory, wherein when a branch corresponding to said branch instruction was taken on execution and said comparison result indicates that said branch location address stored in said branch prediction memory corresponds to said executed branch instruction and said predicted target branch address is not equivalent to said actual target branch address, sending a first update signal to said instruction cache memory to replace said index field with said actual target branch address for said corresponding branch instruction; and g. comparing said branch location address stored in said branch prediction memory with an address of said executed branch instruction and for sending a second update signal to said instruction cache memory to update said branch/no branch prediction to said no branch condition if said branch corresponding to said branch instruction was not taken on execution and said comparison result indicates that said address of said branch instruction is equal to said branch location address stored in said branch prediction memory..Iaddend.
.Iadd.27. An apparatus for prefetching instructions for a processor, comprising: a. an instruction cache memory configured to receive a plurality of instruction blocks, each of said instruction blocks comprising a plurality of instructions and instruction fetch information, wherein said instruction fetch information comprises an index field indicating a succeeding instruction block predicted to be fetched and a branch/no branch prediction; b. a fetch program counter operatively connected to said instruction cache memory to prefetch one of said plurality of instruction blocks stored in said instruction cache memory as a prefetched instruction block based on a fetch program counter value supplied to said instruction cache memory; c. 
an instruction fetch control unit operatively connected to said fetch program counter and said instruction cache memory for reading said instruction fetch information of said prefetched instruction block, wherein said instruction fetch control unit sends a signal to said fetch program counter to increment and supply said fetch program counter value to said instruction cache memory if said branch/no branch prediction stored within said instruction fetch information of said prefetched instruction block indicates a no branch condition, and wherein said instruction fetch control unit sends a signal to said fetch program counter to update said fetch program counter value with said succeeding instruction block stored in said instruction fetch information of said prefetched instruction block if said data representing said branch/no branch prediction stored within said instruction fetch information of said prefetched instruction block indicates a branch condition; d. a branch prediction memory coupled to said instruction cache memory for storing a branch location address and a corresponding predicted target branch address if said data representing said branch/no branch prediction stored within said instruction fetch information of said prefetched instruction block indicates said branch condition; e. an execution unit coupled to said branch prediction memory, wherein when said branch instruction is executed by said execution unit, an actual target branch address is generated, and when a branch corresponding to said branch instruction is taken on execution, said actual target branch address is compared to said predicted target branch address stored within said branch prediction memory and said branch location address is compared with an address of said branch instruction executed by said execution unit, and wherein said index field of said instruction cache memory is updated with said actual target branch address if said actual target branch address is not equivalent to said predicted target branch address or if said branch location address is not equivalent to said address of said branch instruction executed by said execution unit, and wherein when execution of said branch instruction by said execution unit results in said branch corresponding to said branch instruction not being taken, said address of said branch instruction executed by said execution unit is compared with said branch location address stored in said branch prediction memory and said branch/no branch prediction stored in said instruction cache memory is updated to indicate a no branch condition if said address of said branch instruction executed by said execution unit is equivalent to said branch location address stored in said branch prediction memory..Iaddend.
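The update behavior recited in element e of claim 27 can be summarized in a short routine. The Python sketch below is illustrative only and forms no part of the claims; `CacheFetchInfo` and `resolve_branch` are hypothetical names, addresses are plain integers, and the index field is overwritten with the full actual target address exactly as the claim language reads, without modeling any narrower field width.

```python
from dataclasses import dataclass


@dataclass
class CacheFetchInfo:
    """Hypothetical fetch information held with one cached instruction block."""
    index_field: int      # identifies the succeeding block predicted to be fetched
    predict_branch: bool  # branch/no branch prediction


def resolve_branch(info: CacheFetchInfo, taken: bool, branch_address: int,
                   actual_target: int, stored_location: int,
                   stored_target: int) -> None:
    if taken:
        # Taken branch: update the index field when the target or the
        # branch location disagrees with the stored prediction.
        if actual_target != stored_target or stored_location != branch_address:
            info.index_field = actual_target
    elif branch_address == stored_location:
        # Branch predicted taken at this location but not taken:
        # change the prediction to the no branch condition.
        info.predict_branch = False
```

A correctly predicted taken branch leaves `info` unchanged, which corresponds to the case in which fetching proceeds without any correction.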